Add Live Debugger package#4449

Open
watson wants to merge 25 commits into main from watson/DEBUG-5296/add-live-debugger

Conversation

@watson
Collaborator

@watson watson commented Apr 7, 2026

Motivation

Introduce the @datadog/browser-debugger package to enable Live Debugger in browser applications. This gives frontend developers the ability to add log probes to running applications, evaluate conditions, and inspect runtime state — all without redeploying or modifying source code.

Note: This package is intended for internal Datadog use only until validated in production. It follows the same convention as @datadog/browser-core, @datadog/browser-rum-core, and @datadog/browser-worker — published to npm with each release (to keep versions in sync), but excluded from generated documentation via typedoc.json.

Changes

New package: packages/debugger

A new @datadog/browser-debugger package that provides the full probe execution pipeline:

  • domain/api.ts — Core instrumentation hooks (onEntry, onReturn, onThrow) that execute probes when instrumented functions are called, including condition evaluation, snapshot capture, template message rendering, and rate limiting.
  • domain/activeEntries.ts — Tracks per-probe execution stacks for correlating entry/return/throw events, extracted to break the dependency cycle between api.ts and probes.ts.
  • domain/probes.ts — Probe lifecycle management (add, remove, clear) with per-probe and global snapshot rate limiting. Compiles probe conditions and template segments on registration.
  • domain/capture.ts — Deep value capture for arguments, locals, return values, and thrown errors with configurable reference depth, collection size limits, and string length limits.
  • domain/expression.ts — Expression compiler that parses JSON expression trees (comparisons, logical operators, member access, string operations, etc.) into executable functions.
  • domain/condition.ts — Probe condition evaluator that compiles and caches condition expressions.
  • domain/template.ts — Template segment compiler and evaluator for rendering dynamic probe messages with runtime context.
  • domain/stacktrace.ts — Stack trace capture and parsing from Error objects.
  • domain/deliveryApi.ts — Polling-based probe delivery client that fetches probe updates/deletions from the Delivery API using a cursor for incremental sync.
  • transport/startDebuggerBatch.ts — Transport layer that reuses @datadog/browser-core's batch/flush infrastructure to send debugger snapshots to the logs intake.
  • entries/main.ts — Public API surface (datadogDebugger.init()). Exposes $dd_entry/$dd_return/$dd_throw/$dd_probes hooks on globalThis for instrumented code. Defines the global DD_DEBUGGER object.
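
To make the capture limits described for domain/capture.ts more concrete, here is a minimal sketch of depth-limited value capture. It is not the package's implementation: the names `CaptureLimits`, `Captured`, `capture`, and the three limit fields are assumptions chosen for illustration, and the real output shape may differ.

```typescript
// Hypothetical limit names; the real options in domain/capture.ts may differ.
interface CaptureLimits {
  maxReferenceDepth: number // how many levels of nested objects/arrays to follow
  maxCollectionSize: number // max array elements or object fields kept per level
  maxLength: number // max characters kept from a string
}

// Hypothetical output shape, for illustration only.
interface Captured {
  type: string
  value?: string
  elements?: Captured[]
  fields?: Record<string, Captured>
  truncated?: true
  notCapturedReason?: 'depth'
}

function capture(value: unknown, limits: CaptureLimits, depth = 0): Captured {
  if (value === null) {
    return { type: 'null' }
  }
  if (typeof value === 'string') {
    return value.length > limits.maxLength
      ? { type: 'string', value: value.slice(0, limits.maxLength), truncated: true }
      : { type: 'string', value }
  }
  if (typeof value !== 'object') {
    // number, boolean, undefined, bigint, symbol, function
    return { type: typeof value, value: String(value) }
  }
  const type = Array.isArray(value) ? 'Array' : 'Object'
  if (depth >= limits.maxReferenceDepth) {
    // Stop descending; this also bounds the work done on cyclic structures.
    return { type, notCapturedReason: 'depth' }
  }
  if (Array.isArray(value)) {
    const elements = value.slice(0, limits.maxCollectionSize).map((item) => capture(item, limits, depth + 1))
    return value.length > limits.maxCollectionSize ? { type, elements, truncated: true } : { type, elements }
  }
  const record = value as Record<string, unknown>
  const keys = Object.keys(record)
  const fields: Record<string, Captured> = {}
  for (const key of keys.slice(0, limits.maxCollectionSize)) {
    fields[key] = capture(record[key], limits, depth + 1)
  }
  return keys.length > limits.maxCollectionSize ? { type, fields, truncated: true } : { type, fields }
}
```

In this sketch the depth limit doubles as the guard against cycles: a self-referencing object is followed only `maxReferenceDepth` levels deep, so no separate visited set is needed.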

Changes to @datadog/browser-core

  • Added 'dd_debugger' as a valid source in configuration and transport types, mapped to 'browser' for the SDK source.
  • Exported computeTransportConfiguration and the Batch type so the debugger package can create its own transport.

E2E test framework and scenarios

  • test/e2e/scenario/debugger.scenario.ts — 7 E2E test scenarios covering: basic snapshot sending, argument/return value capture, exception capture on throw, template message evaluation with expression segments, condition evaluation (both met and not met), and RUM correlation.
  • E2E framework extensions — Added .withDebugger() builder method to createTest(), DebuggerIntakeRequest type and intakeRegistry.debuggerEvents for asserting on debugger events, debugger page setups for CDN/bundle/npm modes, and default debugger configuration.
  • test/apps/vanilla/app.ts — Added @datadog/browser-debugger import and DEBUGGER_INIT support so debugger E2E tests work in the npm setup.

Performance benchmarks

  • test/apps/instrumentation-overhead/ — Webpack test app for measuring instrumentation overhead with instrumented vs. uninstrumented function variants.
  • test/performance/scenarios/instrumentationOverhead.scenario.ts — Benchmark scenario that stress-tests 10M function calls to measure the overhead of debugger instrumentation hooks.
  • test/performance/createBenchmarkTest.ts — Extended with instrumented_no_probes and instrumented_with_probes scenario configurations and a dedicated injectDebugger function.

Tooling

  • Updated scripts/build/build-test-apps.ts to include the new test app and to use resolution paths when installing peer dependencies (needed for unpublished packages like @datadog/browser-debugger that only exist locally as .tgz files).
  • Updated scripts/dev-server/lib/server.ts to serve the debugger bundle.
  • Added debugger entry point to ESLint side-effects allowlist.

Test instructions

  1. Unit tests: yarn test:unit --spec "packages/debugger/src/**/*.spec.ts"
  2. E2E tests: yarn test:e2e:init && yarn test:e2e -g "debugger"
  3. Performance testing: yarn build:apps && yarn test:performance

Checklist

  • Tested locally
  • Tested on staging — N/A, this is a new pre-production package not yet deployed to any environment
  • Added unit tests for this change.
  • Added e2e/integration tests for this change.
  • Updated documentation and/or relevant AGENTS.md file

@github-actions

github-actions Bot commented Apr 7, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

Collaborator Author

watson commented Apr 7, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

@datadog-datadog-prod-us1

datadog-datadog-prod-us1 Bot commented Apr 7, 2026

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 72.23%
Overall Coverage: 76.69% (-0.32%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 038eb97 | Docs | Datadog PR Page | Give us feedback!

@watson watson force-pushed the watson/DEBUG-5296/add-live-debugger branch from 4df303f to b539c14 Compare April 7, 2026 13:30
@cit-pr-commenter-54b7da

cit-pr-commenter-54b7da Bot commented Apr 7, 2026

Bundles Sizes Evolution

| 📦 Bundle Name | Base Size | Local Size | 𝚫 | 𝚫% | Status |
|---|---|---|---|---|---|
| Rum | 179.65 KiB | 179.69 KiB | +36 B | +0.02% | |
| Rum Profiler | 6.17 KiB | 6.17 KiB | 0 B | 0.00% | |
| Rum Recorder | 27.03 KiB | 27.03 KiB | 0 B | 0.00% | |
| Logs | 56.78 KiB | 56.81 KiB | +36 B | +0.06% | |
| Rum Slim | 135.50 KiB | 135.54 KiB | +36 B | +0.03% | |
| Worker | 23.63 KiB | 23.63 KiB | 0 B | 0.00% | |

🚀 CPU Performance

| Action Name | Base CPU Time (ms) | Local CPU Time (ms) | 𝚫% |
|---|---|---|---|
| RUM - add global context | 0.0039 | 0.0063 | +61.54% |
| RUM - add action | 0.0131 | 0.0182 | +38.93% |
| RUM - add error | 0.0117 | 0.0161 | +37.61% |
| RUM - add timing | 0.0025 | 0.0029 | +16.00% |
| RUM - start view | 0.0119 | 0.0122 | +2.52% |
| RUM - start/stop session replay recording | 0.0006 | 0.0007 | +16.67% |
| Logs - log message | 0.0138 | 0.0164 | +18.84% |

🧠 Memory Performance

| Action Name | Base Memory Consumption | Local Memory Consumption | 𝚫 |
|---|---|---|---|
| RUM - add global context | 31.78 KiB | 30.66 KiB | -1.12 KiB |
| RUM - add action | 56.94 KiB | 55.73 KiB | -1.21 KiB |
| RUM - add timing | 32.72 KiB | 33.19 KiB | +475 B |
| RUM - add error | 60.80 KiB | 58.17 KiB | -2.63 KiB |
| RUM - start/stop session replay recording | 32.10 KiB | 32.67 KiB | +576 B |
| RUM - start view | 484.29 KiB | 483.68 KiB | -625 B |
| Logs - log message | 95.66 KiB | 98.58 KiB | +2.92 KiB |

🔗 RealWorld

@watson
Collaborator Author

watson commented Apr 7, 2026

I have read the CLA Document and I hereby sign the CLA

@watson watson force-pushed the watson/DEBUG-5296/add-live-debugger branch 6 times, most recently from 3042bfe to 98490f7 Compare April 7, 2026 15:45
@watson watson marked this pull request as ready for review April 7, 2026 15:56
@watson watson requested a review from a team as a code owner April 7, 2026 15:56

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 98490f775f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/debugger/src/domain/capture.ts Outdated
Comment thread packages/debugger/src/domain/api.ts Outdated
Comment thread packages/debugger/src/domain/api.ts Outdated
Comment on lines +20 to +23
const hasReplaceAll = typeof (String.prototype as any).replaceAll === 'function'
const replaceDots = hasReplaceAll
  ? (str: string) => (str as string & { replaceAll: (s: string, r: string) => string }).replaceAll('.', '_')
  : (str: string) => str.replace(/\./g, '_')

Member

Suggested change
- const hasReplaceAll = typeof (String.prototype as any).replaceAll === 'function'
- const replaceDots = hasReplaceAll
-   ? (str: string) => (str as string & { replaceAll: (s: string, r: string) => string }).replaceAll('.', '_')
-   : (str: string) => str.replace(/\./g, '_')
+ const replaceDots = (str: string) => str.replace(/\./g, '_')

Collaborator Author

I used this implementation for performance reasons, as I assumed that in some browsers, it might be faster to use replaceAll over replace with a regular expression. What is your reason for wanting to simplify this?
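
For reference, the two variants discussed in this thread can be checked for behavioral equivalence with a short sketch like the one below. The helper names and sample inputs are hypothetical, and the sketch says nothing about which variant is faster; that would need to be measured in the target browsers.

```typescript
// Regex-based variant, as in the suggested change.
const replaceDotsWithRegex = (str: string): string => str.replace(/\./g, '_')

// Feature-detected variant, mirroring the code under review. The casts keep the
// sketch compiling against lib targets older than ES2021.
type StringWithReplaceAll = string & { replaceAll: (search: string, replacement: string) => string }
const hasReplaceAll = typeof (String.prototype as unknown as { replaceAll?: unknown }).replaceAll === 'function'
const replaceDotsWithDetection = hasReplaceAll
  ? (str: string): string => (str as StringWithReplaceAll).replaceAll('.', '_')
  : replaceDotsWithRegex

// Hypothetical sample inputs for a quick equivalence check.
const samples = ['a.b.c', 'no dots', '', '..', 'trailing.']
const mismatches = samples.filter((s) => replaceDotsWithRegex(s) !== replaceDotsWithDetection(s))
```

An empty `mismatches` array indicates that both variants produce the same output for the sampled inputs.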

let fn = functionCache.get(cacheKey)
if (!fn) {
  // eslint-disable-next-line no-new-func, @typescript-eslint/no-implied-eval
  fn = new Function(...contextKeys, fnBody) as (...args: any[]) => boolean

Member

issue: eval is evil, and we shouldn't rely on it if we can avoid it. A common best practice is to deploy a CSP that blocks this kind of functionality.

Is there something specific in the language that requires "inline" JavaScript code? Otherwise, we could probably parse the expression into an AST and evaluate it directly in JS without generating JS code.

Collaborator Author

Yeah, I know 😅 This is unfortunately the best way I've found to make this code performant. The input to new Function() is only coming from the probe definitions which are fetched from the debugger-delivery-api service in dd-source. Users of the website should not have access to manipulate this input. I previously brought this up with the security team and they have given 👍

Collaborator Author

Background: Probes can have conditions that need to be evaluated at runtime to know if the probe should trigger or not. These conditions are defined in the probe definition in a custom DSL. This DSL needs to be translated to JavaScript so we can execute it every time the instrumented method is called. That's why it needs to be performant.
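
To illustrate the alternative raised in this thread, the sketch below compiles a JSON expression tree into nested closures once, at registration time, without generating JavaScript source. The `Expr` shape (`ref`, `eq`, `gt`, `and`, `not`) and the `compile` function are assumptions made for illustration and are not the package's actual DSL. Whether closures are fast enough compared with `new Function()` is not measured here.

```typescript
// Hypothetical subset of a condition DSL, expressed as a JSON tree.
type Expr =
  | number
  | string
  | boolean
  | null
  | { ref: string }
  | { eq: [Expr, Expr] }
  | { gt: [Expr, Expr] }
  | { and: [Expr, Expr] }
  | { not: Expr }

type Context = Record<string, unknown>
type Evaluator = (context: Context) => unknown

// Walk the tree once and return a closure; evaluating the closure later does
// not re-inspect the tree and never calls eval or new Function.
function compile(expr: Expr): Evaluator {
  if (expr === null || typeof expr !== 'object') {
    return () => expr // literal
  }
  if ('ref' in expr) {
    const name = expr.ref
    return (context) => context[name]
  }
  if ('eq' in expr) {
    const left = compile(expr.eq[0])
    const right = compile(expr.eq[1])
    return (context) => left(context) === right(context)
  }
  if ('gt' in expr) {
    const left = compile(expr.gt[0])
    const right = compile(expr.gt[1])
    return (context) => (left(context) as number) > (right(context) as number)
  }
  if ('and' in expr) {
    const left = compile(expr.and[0])
    const right = compile(expr.and[1])
    return (context) => Boolean(left(context)) && Boolean(right(context))
  }
  if ('not' in expr) {
    const inner = compile(expr.not)
    return (context) => !inner(context)
  }
  throw new Error('Unsupported expression')
}
```

A condition compiled this way is called on every invocation of the instrumented function, for example `compile({ gt: [{ ref: 'count' }, 5] })({ count: 10 })`, and keeps working under a CSP that disallows `unsafe-eval`.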


declare const __BUILD_ENV__SDK_VERSION__: string

const DELIVERY_API_PATH = '/api/ui/debugger/probe-delivery'

Member

Question: what is this route? Is this supposed to live on the customer website?

Collaborator Author

This is only a temporary path that works only when running inside web-ui. I have a follow-up PR to generalize it: #4480

I deliberately left that change out of this PR because this PR was already in review by the time I made it. If you prefer, I can merge that PR into this one.

Comment thread packages/debugger/package.json Outdated
Comment thread test/performance/createBenchmarkTest.ts Outdated
Comment thread test/e2e/scenario/debugger.scenario.ts Outdated
createTest('send debugger snapshot when instrumented function is called')
  .withDebugger()
  .run(async ({ intakeRegistry, flushEvents, page, browserName, servers }) => {
    test.skip(browserName !== 'chromium', 'Debugger tests require Chromium')

Member

question: why do the debugger tests require Chromium?

Collaborator Author

Honestly, I can't remember why I did this. I remember running into some issue but I'm not sure anymore. I'll try to re-enable it and see what happens.

Collaborator Author

Seems to work fine on all browsers 👍

Collaborator Author

I was wrong. Firefox has some issues: https://gitlab.ddbuild.io/DataDog/browser-sdk/-/jobs/1664026268

I'll investigate

Collaborator Author

Just pushed a commit to try and increase the timeouts: 038eb97

Collaborator Author

I wonder if this is related to the performance degradation we know exists for Firefox 🤔 (as Firefox can't optimize the functions as well as other JavaScript engines can)

Comment thread test/e2e/lib/framework/intakeProxyMiddleware.ts Outdated
@watson watson force-pushed the watson/DEBUG-5296/add-live-debugger branch from 8130067 to 626668e Compare May 7, 2026 05:31
watson added 24 commits May 7, 2026 11:27
Introduce the browser debugger SDK and probe execution pipeline so
browser code can evaluate conditions, capture snapshots, and render
probe messages at runtime. Add Delivery API polling plus sandbox and
performance tooling to support probe delivery and testing.

Use the existing debugger service configuration when polling for probes so
browser callers keep sending a valid delivery identity after the
applicationId rollback. This also removes the stale applicationId init
examples and test fixtures from the debugger package.

Made-with: Cursor

Previously Node.js was the only one using this

Ensure we don't publish it while WIP

In the browser there's no trace or span id

Replace the hand-rolled `$dd_*` stub hooks with the actual published SDK
so the benchmark measures real per-call instrumentation overhead, not
the cost of counter-incrementing stubs. With the stubs the
`instrumented_with_probes` configuration came in at roughly the same
cost as `none`; with the real SDK it's ~1.65 µs per call, which is the
number we actually want to track.

For the measurement to be statistically sound the SDK has to be fully
ready before the warmup loop runs, otherwise V8 JIT-optimizes against
an intermediate `$dd_probes`-undefined shape and then deopts mid-flight
once probes appear. To get there:

- Inline the built debugger bundle via `addInitScript({ path })` so
  `DD_DEBUGGER` is defined before the test app's script tag executes.
- Mock the SDK's same-origin probe-delivery endpoint on the perf
  server, routing off the request body's `service` field so parallel
  benchmark workers stay isolated.
- Gate the scenario's warmup on a `__benchmarkReady` flag that the
  injector flips only after `init()` returns and (for
  `instrumented_with_probes`) the first poll has populated the registry.

The probe used for `instrumented_with_probes` is a typical low-impact
`LOG_PROBE` (`captureSnapshot: false`, `snapshotsPerSecond: 1`), so
measurement stays focused on probe-lookup + sampling-check cost rather
than intake traffic. The SDK's `pollInterval` is set to one day so
re-polls can't perturb the measurement window.

Previously, the Debugger SDK set `source: 'dd_debugger'` on its init
configuration so it would flow into the `ddsource` URL parameter, which
forced `validateAndBuildConfiguration` to convert it back to 'browser'
via a `toSdkSource` helper before storing on `Configuration.source`
(used as the SDK source on RUM events). The round-trip was confusing
because the same field was playing two roles: URL routing source and
RUM event SDK source.

Split them apart:

- `InitConfiguration.source` no longer accepts 'dd_debugger'.
- `computeTransportConfiguration` takes an optional `sourceOverride`
  parameter (typed `TransportSource`, includes 'dd_debugger') used only
  for URL building. The Debugger SDK passes 'dd_debugger' there.
- Function overloads narrow the return type when no override is given,
  so `Configuration.source` is `SdkSource` without a runtime conversion.
- `toSdkSource` is removed.

Wire behavior is unchanged: URLs still go out with
`?ddsource=dd_debugger&...&dd-evp-origin=browser`, RUM events still
carry a valid `SdkSource`, and downstream `source:dd_debugger` queries
keep working.
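
The overload-based narrowing described in this commit message can be sketched as follows. This is a simplified stand-in, not the real function: the name `computeTransportConfigurationSketch`, the `site` parameter, the members of `SdkSource`, and the placeholder intake URL are all assumptions made for illustration.

```typescript
// Hypothetical source types; the real unions in @datadog/browser-core may differ.
type SdkSource = 'browser' | 'flutter' | 'unity'
type TransportSource = SdkSource | 'dd_debugger'

interface TransportConfiguration<S extends TransportSource> {
  source: S
  intakeUrl: string
}

// No override: the return type is narrowed to SdkSource, so callers need no runtime conversion.
function computeTransportConfigurationSketch(site: string): TransportConfiguration<SdkSource>
// With an override: the wider TransportSource is used, only for URL building.
function computeTransportConfigurationSketch(
  site: string,
  sourceOverride: TransportSource
): TransportConfiguration<TransportSource>
function computeTransportConfigurationSketch(
  site: string,
  sourceOverride?: TransportSource
): TransportConfiguration<TransportSource> {
  const source = sourceOverride ?? 'browser'
  // Placeholder URL shape, for illustration only.
  return { source, intakeUrl: `https://intake.example/${site}?ddsource=${source}` }
}

// Compiles only because the first overload narrows the type to SdkSource.
const defaultSource: SdkSource = computeTransportConfigurationSketch('example-site').source
const debuggerUrl = computeTransportConfigurationSketch('example-site', 'dd_debugger').intakeUrl
```

With this shape, passing `'dd_debugger'` affects only the generated URL, while code that reads the source of the default configuration keeps the narrower type.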

@watson watson force-pushed the watson/DEBUG-5296/add-live-debugger branch from ce9f93d to 09376b1 Compare May 7, 2026 09:28

BrowserStack Firefox can pause long enough during full-suite unit runs
to miss Karma's default heartbeat and no-activity thresholds. This makes
the job fail as a disconnect even though the browser may still be
running the suite.

Increase the BrowserStack-only limits from Karma's defaults:
- browserDisconnectTimeout: 2s -> 30s
- pingTimeout: 60s -> 120s
- browserNoActivityTimeout: 60s -> 120s

Keep the change scoped to BrowserStack so local unit runs still use the
stricter base config.
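
A scoped override of this kind might look like the sketch below. The three option names are Karma configuration options and the values are the ones listed in the commit message, converted to milliseconds; the `KarmaTimeouts` type, the two constants, and the `resolveTimeouts` helper are hypothetical and do not reflect the repository's actual config files.

```typescript
// Only the three timeouts mentioned in the commit message; values in milliseconds.
interface KarmaTimeouts {
  browserDisconnectTimeout: number
  pingTimeout: number
  browserNoActivityTimeout: number
}

// Stricter values kept for local unit runs (the "before" values from the commit message).
const baseTimeouts: KarmaTimeouts = {
  browserDisconnectTimeout: 2_000,
  pingTimeout: 60_000,
  browserNoActivityTimeout: 60_000,
}

// Relaxed values applied only when running on BrowserStack.
const browserStackTimeouts: KarmaTimeouts = {
  browserDisconnectTimeout: 30_000,
  pingTimeout: 120_000,
  browserNoActivityTimeout: 120_000,
}

// Hypothetical helper: pick the timeouts to spread into the Karma config object.
function resolveTimeouts(isBrowserStack: boolean): KarmaTimeouts {
  return isBrowserStack ? { ...baseTimeouts, ...browserStackTimeouts } : baseTimeouts
}
```

The result of `resolveTimeouts(true)` would be spread into the BrowserStack-specific Karma configuration, leaving the base configuration untouched.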
4 participants